NOTE: this is a revised version of a paper presented at
ISIPTA 2009. This is not a major revision; the purpose
was to correct some problems with the original paper.
The main point is that Example 1 has been changed (the
original example was flawed). Also, the definition of $B_i$
and some inequalities in the proof of limits in Theorem
4 have been corrected.



Abstract 

This paper presents concentration inequalities and laws of 
large numbers under weak assumptions of irrelevance, ex- 
pressed through lower and upper expectations. The results 
are variants and extensions of De Cooman and Miranda's 
recent inequalities and laws of large numbers. The proofs 
indicate connections between concepts of irrelevance for 
lower/upper expectations and the standard theory of mar- 
tingales. 

1 Introduction 

This paper investigates concentration inequalities and laws
of large numbers under weak assumptions of "irrelevance"
that are expressed using lower and upper expectations. The
starting point is the assumption that, given bounded variables
$X_1, \dots, X_n$, we have:

for each $i \in [2, n]$, variables $X_1, \dots, X_{i-1}$ are epistemically irrelevant to $X_i$. (1)

Epistemic irrelevance of variables $X_1, \dots, X_{i-1}$ to $X_i$ obtains when [26, Def. 9.2.1]

$$\overline{E}[f(X_i) \mid A(X_{1:i-1})] = \overline{E}[f(X_i)] \tag{2}$$

for any bounded function $f$ of $X_i$ and any nonempty event
$A(X_{1:i-1})$ defined by variables $X_{1:i-1}$, where the functional
$\overline{E}$ is an upper expectation (Section 2). Here and in
the remainder of the paper we simplify notation by using
$X_{1:j}$ for $X_1, \dots, X_j$.



A judgement of epistemic irrelevance can be inter- 
preted as a relaxed judgement of stochastic independence, 
perhaps motivated by a robustness analysis or by disagree- 
ments amongst a set of decision makers. Alternatively, 
one might consider epistemic irrelevance as the appropriate 
concept of independence when expectations are not known 
precisely. 

De Cooman and Miranda have recently proven a number
of inequalities and laws of large numbers that also
deal with judgements of irrelevance expressed through
lower/upper expectations [5]. De Cooman and Miranda's
weak law of large numbers implies that, given Assumption
(1), for any $\epsilon > 0$,



f/4 



> 1 - 2e c 



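where $B_i$ is such that $\sup X_i - \inf X_i \le B_i$, and

$$\underline{\mu}_n = \frac{1}{n}\sum_{i=1}^n \underline{E}[X_i], \qquad \overline{\mu}_n = \frac{1}{n}\sum_{i=1}^n \overline{E}[X_i].$$
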

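Moreover, De Cooman and Miranda's results and Assumption
(1) imply a two-part strong law of large numbers: for
any $\epsilon > 0$, there is $N \in \mathbb{N}^+$ such that for any $N' \in \mathbb{N}^+$,

$$\overline{P}\left(\exists n \in [N, N+N'] : \frac{\sum_{i=1}^n X_i}{n} > \overline{\mu}_n + \epsilon\right) < \epsilon,$$

$$\overline{P}\left(\exists n \in [N, N+N'] : \frac{\sum_{i=1}^n X_i}{n} < \underline{\mu}_n - \epsilon\right) < \epsilon.$$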



This law of large numbers corresponds to a finitary version 
of the usual strong law of large numbers [9]; the focus on
a finitary law is justified by the fact that De Cooman and 
Miranda do not assume countable additivity. If countable 
additivity holds, the finitary strong law of large numbers 
implies convergence of empirical means with probability 
one El Sec. 5.3]. 

To obtain their results, De Cooman and Miranda as- 
sume, following Walley's theory of lower previsions, that 
all variables are bounded, and that conglomerability (and 
consequently disintegrability) holds. These assumptions 
are discussed in more detail later. 
The present paper derives laws of large numbers by exploiting
concentration and martingale inequalities that are
adapted to the setting of lower/upper expectations. These
results use either Assumption (1) or the weaker assumption
that, for each $i \in [2, n]$ and any nonempty event
$A(X_{1:i-1})$,

$$\underline{E}[X_i \mid A(X_{1:i-1})] = \underline{E}[X_i] \quad \text{and} \quad \overline{E}[X_i \mid A(X_{1:i-1})] = \overline{E}[X_i]. \tag{3}$$

Several results for bounded variables presented in this pa- 
per are basically implied by De Cooman and Miranda's 
work. Regarding bounded variables our contribution lies 
in offering tighter inequalities and alternative proof tech- 
niques that are more closely related to established methods 
in standard probability theory (in particular, close to Hoeffding's
and Azuma's inequalities). In Section 4 we offer
more significant contributions as we lift the assumption
of boundedness for variables, and use martingale theory to 
prove laws of large numbers under elementwise disintegra- 
bility. 

2 Expectations, disintegrability, and 
zero probabilities 

In this section we present notation and terminology. 
Throughout the paper we assume that an expectation func- 
tional E maps bounded variables into real numbers, and 
satisfies: 

(1) if $\alpha \le X \le \beta$, then $\alpha \le E[X] \le \beta$;

(2) $E[X + Y] = E[X] + E[Y]$;

where $X$, $Y$ are bounded variables and $\alpha$, $\beta$ are real numbers
(inequalities are understood pointwise).

From such an expectation functional, a finitely additive
probability measure $P$ is induced by $P(A) = E[A]$ for any
event $A$; note that $A$ denotes both the event and its indicator
function.¹

Given a set of expectation functionals, the lower and
upper expectations of variable $X$ are respectively

$$\underline{E}[X] = \inf E[X], \qquad \overline{E}[X] = \sup E[X].$$

Lower and upper probabilities are defined similarly using
indicator functions. Given an event $A$, a conditional
expectation functional $E[\cdot \mid A]$ is constrained by
$E[X \mid A]\,P(A) = E[XA]$. If we have a set of expectation functionals, then a
set of conditional expectation functionals given an event $A$
is produced by elementwise conditioning on event $A$ (that
is, each expectation functional is conditioned on $A$).
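As an illustration of these definitions, the sketch below computes lower and upper expectations over a small finite set of probability mass functions, where the infimum and supremum reduce to a minimum and a maximum. The two distributions and the helper names `expectation` and `lower_upper` are assumptions made only for this example; they do not come from the paper.

```python
from fractions import Fraction

def expectation(pmf, f):
    """Expectation of f(X) under a finite probability mass function."""
    return sum(p * f(x) for x, p in pmf.items())

def lower_upper(credal_set, f):
    """Lower and upper expectations over a finite set of distributions."""
    values = [expectation(pmf, f) for pmf in credal_set]
    return min(values), max(values)

# Hypothetical set with two distributions for a binary variable X.
credal_set = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(1, 4), 1: Fraction(3, 4)},
]

lo, hi = lower_upper(credal_set, lambda x: x)
neg_lo, neg_hi = lower_upper(credal_set, lambda x: -x)
print(lo, hi)          # lower and upper expectation of X
print(lo == -neg_hi)   # conjugacy: lower E[X] equals minus upper E[-X]
```

The conjugacy relation checked in the last line is the same one used later in the proofs to pass from upper to lower bounds.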



1 A probability measure defined on a field completely characterizes an
expectation functional on bounded functions that are measurable with respect
to the field and vice versa [26, Theorem 3.2.2].



2.1 Disintegrability and factorization 

We will employ an assumption of disintegrability in our
proofs; namely,

$$\overline{E}[W] \le \overline{E}\big[\overline{E}[W \mid Z]\big] \tag{4}$$

for any $W \ge 0$, $Z \ge 0$ of interest, where $W$ and $Z$ may
stand for sets of (non-negative) variables. Note that disintegrability
can fail for a single finitely additive probability
measure over an infinite space [10, 16]; that is, there is a
finitely additive probability measure $P$ such that

$$E_P[W] > E_P\big[E_P[W \mid Z]\big].$$

One way to obtain disintegrability is to restrict attention 
to simple variables; that is, variables that take on finitely 
many distinct values. In particular, indicator functions are 
simple variables; hence simple variables suffice to express 
convergence of relative frequencies, and our results apply 
then. 

Another way to obtain disintegrability for every probability
measure $P$ is to adopt countable additivity [1]. That
is, assume that if

$$A_1 \supseteq A_2 \supseteq \cdots$$

is a countable sequence of events, then

$$\bigcap_i A_i = \emptyset \quad \text{implies} \quad \lim_{n \to \infty} P(A_n) = 0. \tag{5}$$

This assumption says that if $\bigcap_i A_i = \emptyset$, then
$\lim_{n \to \infty} P(A_n) = 0$ for every possible probability measure.

A third way to obtain disintegrability is simply to im- 
pose it. One may consider disintegrability a "rationality" 
requirement. 

• The theories of coherent behavior by Heath and Sudderth [14]
and by Lane and Sudderth [19] follow this
path by axiomatizing the strategic measures of Dubins
and Savage [11], and thus prescribing probability
measures that disintegrate appropriately along
some predefined partitions. This would be sufficient
for our purposes, but there are limitations in the approach
as summarized by Kadane et al. [16]. The disintegrability
of strategic measures has actually been
used to prove various laws of large numbers in a
finitely additive setting [11].

• Another scheme that imposes disintegrability is Walley's
theory of lower previsions; in that theory, Expression
(4) is a consequence of axioms for "coherent"
behavior. This is the path adopted by De
Cooman and Miranda, who consequently have Expression
(4) at their disposal.
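When disintegrability holds, recursive application of
Expression (4) yields: if $f_i(X_i) \ge 0$ for $i \in \{1, \dots, n\}$,
then

$$\overline{E}\left[\prod_{i=1}^n f_i(X_i)\right] \le \overline{E}\left[\overline{E}\left[\cdots \overline{E}\left[\prod_{i=1}^n f_i(X_i) \,\Big|\, X_{1:n-1}\right] \cdots \,\Big|\, X_1\right]\right].$$

Assumption (1) then implies an inequality we use later: for
bounded and nonnegative functions,

$$\overline{E}\left[\prod_{i=1}^n f_i(X_i)\right] \le \prod_{i=1}^n \overline{E}[f_i(X_i)]. \tag{6}$$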



2.2 Zero probabilities, full conditional mea- 
sures and weak irrelevance 

It should be noted that the definition of epistemic irrelevance
(Expression (2)) does not contain any clause concerning
zero probabilities. Indeed, Walley's theory of lower
previsions follows de Finetti in adopting full conditional
measures, and in this setting Expression (2) can be imposed
without concerns about zero probabilities. Recall that a full
conditional measure $P : \mathcal{B} \times (\mathcal{B} \setminus \{\emptyset\}) \to \Re$, where $\mathcal{B}$ is a
Boolean algebra, is a set-function that for every nonempty
event $C$ satisfies [10, 18]:

(1) $P(C \mid C) = 1$;

(2) $P(A \mid C) \ge 0$ for all $A$;

(3) $P(A \cup B \mid C) = P(A \mid C) + P(B \mid C)$ for all disjoint $A$
and $B$;

(4) $P(A \cap B \mid C) = P(A \mid B \cap C)\, P(B \mid C)$ for all $A$ and $B$
such that $B \cap C \ne \emptyset$.

Full conditional measures are not adopted in the
usual Kolmogorovian theory, and if countable additivity
is adopted and conditioning is defined through Radon-Nikodym
derivatives, it may be impossible to satisfy the
axioms for full conditional measures [23, 24]. Thus there
are some differences between epistemic irrelevance (at
least as defined by Walley) and the usual Kolmogorovian
set-up, besides the obvious set-valued/point-valued distinction.

Suppose that one wishes to deal with sets of probability
measures and associated lower/upper expectations, but
chooses to adopt the Kolmogorovian set-up for each measure.
That is, each measure satisfies countable additivity
and thus disintegrability, and conditioning is left undefined
when the conditioning event has probability zero. It might
seem reasonable to amend Expression (2) as follows:

$$\overline{E}[f(X_i) \mid A(X_{1:i-1})] = \overline{E}[f(X_i)] \quad \text{if } \underline{P}(A(X_{1:i-1})) > 0. \tag{7}$$

This condition is natural for theories that do not define
conditioning on events of lower probability zero, such as
Giron and Rios' theory [13]. Alas, this weaker condition
is really too weak to produce laws of large numbers, as the
following example shows.

Example 1 Consider binary variables $X_1, X_2, \dots$ (values
0 and 1). Define events $A_0 = \{X_1 = 0, X_2 = 0, \dots\}$ and
$A_1 = \{X_1 = 1, X_2 = 1, \dots\}$. Consider a convex and
closed set $K$ of joint distributions for these variables, built
as the convex hull of three distributions, $P_1$, $P_2$ and $P_3$, as
follows.

Distribution $P_1$ simply assigns probability one to $A_1$.
Distribution $P_2$ assigns probability $\delta$ to $A_0$ and probability
$1 - \delta$ to $A_1$, for some $\delta \in (0, 1)$. Distribution $P_3$ is
the product of identical marginals: for any integer $n > 0$,
$P_3(X_1 = x_1, \dots, X_n = x_n) = \prod_{i=1}^n P_3(X_i = x_i)$, where
$P_3(X_i = 1) = 1 - \delta$.

For the convex hull of $P_1$, $P_2$ and $P_3$, Expression
(7) is satisfied. This conclusion is reached by analyzing
each distribution in turn. For distribution $P_1$, we
have $P_1(X_1 = 1) = 1$ and for any $i > 1$ we have
$P_1(X_i = 1 \mid A(X_{1:i-1})) = 1$ whenever $\underline{P}(A(X_{1:i-1})) > 0$.
Note that for any event $A(X_{1:i-1})$: if $A_1 \subseteq A$, then
$P_1(A) = 1$; if $A_1 \not\subseteq A$, then $P_1(A) = 0$. For distribution
$P_2$, $P_2(X_i = 1) = 1 - \delta$ for any $i > 0$. Additionally,
for any event $A(X_{1:i-1})$ we have $P_2(X_i = 1 \mid A)$ either
equal to $1 - \delta$ or 1 whenever $\underline{P}(A) > 0$. [If $A_1 \not\subseteq A$,
then $\underline{P}(A) = 0$ (due to $P_1$). So suppose $A_1 \subseteq A$: if
$A_0 \subseteq A$, then $P_2(X_i = 1 \mid A) = 1 - \delta$; if $A_0 \not\subseteq A$,
then $P_2(X_i = 1 \mid A) = 1$.] For distribution $P_3$, we have
$P_3(X_i = 1) = 1 - \delta$ and for any $i > 1$ we have
$P_3(X_i = 1 \mid A) = 1 - \delta$ for any nonempty event $A(X_{1:i-1})$.
In short, for all probability measures in the credal set we
have $P(X_i = 1) \in [1 - \delta, 1]$ and $P(X_i = 1 \mid A(X_{1:i-1})) \in [1 - \delta, 1]$
whenever $\underline{P}(A(X_{1:i-1})) > 0$.

The weak law of large numbers fails because, for any
$\epsilon \in (0, 1 - \delta)$,

$$\lim_{n \to \infty} \underline{P}\left(\underline{\mu}_n - \epsilon \le \frac{\sum_{i=1}^n X_i}{n} \le \overline{\mu}_n + \epsilon\right) = 1 - \delta.$$

This follows from the fact that, for any integer
$n > 0$, we have $P_1\left(\sum_{i=1}^n X_i/n = 1\right) = 1$ and
$P_2\left(\sum_{i=1}^n X_i/n = 1\right) = 1 - P_2(A_0) = 1 - \delta$, and for
any $\epsilon > 0$ (due to the standard weak law of large numbers),

$$\lim_{n \to \infty} P_3\left((1 - \delta) - \epsilon \le \sum_{i=1}^n X_i/n \le (1 - \delta) + \epsilon\right) = 1.$$
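The limit in Example 1 can be checked numerically. The following is a minimal sketch, assuming the band $[1 - \delta - \epsilon, 1 + \epsilon]$ for the empirical mean (that is, $\underline{\mu}_n = 1 - \delta$ and $\overline{\mu}_n = 1$) and using the fact that the probability of a fixed event is linear in the mixture weights, so the minimum over the convex hull is attained at $P_1$, $P_2$ or $P_3$. The function names and the values of `delta` and `eps` are hypothetical choices for illustration.

```python
from math import comb

def prob_p1():
    # Under P1 every X_i equals 1, so the empirical mean is 1 (inside the band).
    return 1.0

def prob_p2(delta, eps):
    # Under P2 the mean is 1 with probability 1 - delta and 0 with probability delta.
    zero_in_band = (1.0 - delta - eps) <= 0.0
    return (1.0 - delta) + (delta if zero_in_band else 0.0)

def prob_p3(n, delta, eps):
    # Under P3 the sum of X_1..X_n is Binomial(n, 1 - delta).
    p = 1.0 - delta
    lo, hi = 1.0 - delta - eps, 1.0 + eps
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k)
               for k in range(n + 1) if lo <= k / n <= hi)

def lower_prob(n, delta, eps):
    # Minimum over the three extreme points of the credal set.
    return min(prob_p1(), prob_p2(delta, eps), prob_p3(n, delta, eps))

for n in (10, 100, 1000):
    print(n, lower_prob(n, delta=0.2, eps=0.1))
```

For large `n` the value is governed by `prob_p2`, which stays at `1 - delta` regardless of `n`; this is the mechanism by which the weak law fails in the example.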

We might thus consider an alternative to Expression (7):

$$\overline{E}[f(X_i) \mid A(X_{1:i-1})] = \overline{E}[f(X_i)] \quad \text{if } \overline{P}(A(X_{1:i-1})) > 0. \tag{8}$$

The concept of irrelevance conveyed by Expression (8)
does lead to Expression (6). To see this, note that for nonnegative
$X$ and $Y$, we have

$$\overline{E}[XY] \le \sup_P E_P\big[\overline{E}[XY \mid Y]\big] = \sup_P E_P\big[A\,\overline{E}[XY \mid Y] + A^c\,\overline{E}[XY \mid Y]\big],$$

using disintegrability and defining $A$ as the set of all values
of $Y$ such that $\overline{P}(A^c) = 0$. Hence $P(A^c) = 0$ for every $P$
and, using Expression (8):

$$\begin{aligned}
\overline{E}[XY] &\le \sup_P E_P\big[A\,Y\,\overline{E}[X \mid Y]\big] \\
&= \sup_P E_P\big[A\,Y\,\overline{E}[X]\big] \\
&= \sup_P E_P[AY]\,\overline{E}[X] \\
&= \overline{E}[X]\,\sup_P E_P[Y] \\
&= \overline{E}[X]\,\overline{E}[Y].
\end{aligned}$$

[As a digression, note that one might define conditional
expectations as $\underline{E}[X \mid A] = \inf_{P : P(A) > 0} E_P[X \mid A]$
and $\overline{E}[X \mid A] = \sup_{P : P(A) > 0} E_P[X \mid A]$. This form of conditioning
has been advocated by several authors [27, 28],
and it is quite similar to Walley's concept of regular extension
[26, Ap. J]. For such a form of conditioning, Expression
(8) seems to be the natural definition of irrelevance.]

In short, more than one combination of definitions and
assumptions leads to the results presented in the remainder
of this paper. For instance, Expression (6) obtains when
Assumption (1) holds and disintegrability holds (because
all variables are simple, or because countable additivity is
assumed, or because disintegrability is imposed). Alternatively,
Expression (6) obtains when Expression (8) holds
for any $i \in [2, n]$, any bounded function $f$ of $X_i$, and any
event $A(X_{1:i-1})$, and additionally disintegrability holds.

Similar remarks concerning zero probabilities can be
directed at Assumption (3). We say that weak irrelevance
obtains when either one of the following holds:

• For any $i \in [2, n]$ and any nonempty event $A(X_{1:i-1})$,

$$\underline{E}[X_i \mid A(X_{1:i-1})] = \underline{E}[X_i] \quad \text{and} \quad \overline{E}[X_i \mid A(X_{1:i-1})] = \overline{E}[X_i]$$

[this is Assumption (3), and it requires full conditional
measures].

• For any $i \in [2, n]$ and any event $A(X_{1:i-1})$,

$$\underline{E}[X_i \mid A(X_{1:i-1})] = \underline{E}[X_i] \quad \text{if } \overline{P}(A(X_{1:i-1})) > 0$$

and

$$\overline{E}[X_i \mid A(X_{1:i-1})] = \overline{E}[X_i] \quad \text{if } \overline{P}(A(X_{1:i-1})) > 0.$$

3 Bounded variables

Take variables $X_1, \dots, X_n$ such that $\sup X_i - \inf X_i \le B_i$
and define

$$\gamma_n = \sum_{i=1}^n B_i^2.$$

We start by deriving two concentration inequalities.

3.1 Concentration inequalities 

The following inequality is a counterpart of Hoeffding's
inequality [8, 15] in the context of lower/upper expectations;
it is slightly tighter than similar inequalities by De Cooman
and Miranda [5]. It is interesting to note that the proof is
remarkably similar to the proof of the original Hoeffding
inequality.



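Theorem 1 If bounded variables $X_1, \dots, X_n$ satisfy Expression
(6), then if $\gamma_n > 0$,

$$\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right) \le e^{-2\epsilon^2/\gamma_n},$$

$$\overline{P}\left(\sum_{i=1}^n (X_i - \underline{E}[X_i]) \le -\epsilon\right) \le e^{-2\epsilon^2/\gamma_n}.$$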

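Proof. By Markov's inequality, if $X \ge 0$, then for any $\epsilon > 0$
we have $P(X \ge \epsilon) \le E[X]/\epsilon$. Consequently, for $s > 0$,
any variable $X$ satisfies

$$P(X \ge \epsilon) = P(e^{sX} \ge e^{s\epsilon}) \le e^{-s\epsilon} E[\exp(sX)].$$

Using this inequality and Expression (6):

$$\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right) \le e^{-s\epsilon}\, \overline{E}\left[\exp\left(s \sum_{i=1}^n (X_i - \overline{E}[X_i])\right)\right] \le e^{-s\epsilon} \prod_{i=1}^n \overline{E}\big[\exp\big(s(X_i - \overline{E}[X_i])\big)\big].$$

We now use Hoeffding's result (Expression (11)) that if
variable $X$ satisfies $a \le X \le b$ and $E[X] \le 0$, then
$E[\exp(sX)] \le \exp(s^2(b - a)^2/8)$ for any $s > 0$. Thus for
any $P$, $E_P[\exp(s(X_i - \overline{E}[X_i]))] \le \exp(s^2 B_i^2/8)$, and
then $\overline{E}[\exp(s(X_i - \overline{E}[X_i]))] \le \exp(s^2 B_i^2/8)$. Consequently,

$$\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right) \le e^{-s\epsilon}\, e^{s^2 \gamma_n/8} \le e^{-2\epsilon^2/\gamma_n},$$

where the last inequality is obtained by taking $s = 4\epsilon/\gamma_n$.
This proves the first inequality in the
theorem; the second inequality is proved by taking
$\overline{P}\left(\sum_{i=1}^n ((-X_i) - \overline{E}[-X_i]) \ge \epsilon\right)$ and noting that
$\underline{E}[X_i] = -\overline{E}[-X_i]$. □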
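The last step of the proof can be checked numerically. The sketch below evaluates the bound $e^{-s\epsilon} e^{s^2\gamma_n/8}$ for several values of $s$ and compares it with $e^{-2\epsilon^2/\gamma_n}$; the function names and the choice of ranges $B_i = 1$ for $n = 100$ variables are assumptions made for illustration only.

```python
import math

def gamma_n(B):
    """Sum of squared ranges, gamma_n = sum_i B_i^2."""
    return sum(b * b for b in B)

def chernoff_bound(s, eps, B):
    """Bound exp(-s*eps) * exp(s^2 * gamma_n / 8) for a given s > 0."""
    return math.exp(-s * eps + s * s * gamma_n(B) / 8.0)

def hoeffding_bound(eps, B):
    """Bound exp(-2 * eps^2 / gamma_n), obtained at s = 4 * eps / gamma_n."""
    return math.exp(-2.0 * eps * eps / gamma_n(B))

B = [1.0] * 100        # hypothetical ranges B_i for n = 100 variables
eps = 10.0
s_star = 4.0 * eps / gamma_n(B)

for s in (0.5 * s_star, s_star, 2.0 * s_star):
    print(s, chernoff_bound(s, eps, B), hoeffding_bound(eps, B))
```

The exponent $-s\epsilon + s^2\gamma_n/8$ is a quadratic in $s$ minimized at $s = 4\epsilon/\gamma_n$, so any other value of $s$ gives a looser bound.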

We now move to weak irrelevance and obtain an ana- 
logue of Azuma's inequality 1213. It is again interesting 
to note that the proof is remarkably similar to the proof of 
the original Azuma inequality. De Cooman and Miranda 
[5, Sec. 4.1] show that their inequalities are valid under
weak irrelevance; the next inequality is slightly tighter than 
theirs. 

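Theorem 2 If bounded variables $X_1, \dots, X_n$ satisfy weak
irrelevance and disintegrability (Expression (4)) holds,
then if $\gamma_n > 0$,

$$\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right) \le e^{-2\epsilon^2/\gamma_n},$$

$$\overline{P}\left(\sum_{i=1}^n (X_i - \underline{E}[X_i]) \le -\epsilon\right) \le e^{-2\epsilon^2/\gamma_n}.$$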

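Proof. Using both Markov's inequality (as in the proof of
Theorem 1) and disintegrability, for any $s > 0$ we get

$$\begin{aligned}
\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right)
&\le e^{-s\epsilon}\, \overline{E}\left[\exp\left(\sum_{i=1}^n s(X_i - \overline{E}[X_i])\right)\right] \\
&\le e^{-s\epsilon}\, \overline{E}\left[\overline{E}\left[\exp\left(\sum_{i=1}^n s(X_i - \overline{E}[X_i])\right) \,\Big|\, X_{1:n-1}\right]\right] \\
&\le e^{-s\epsilon}\, \overline{E}\left[\exp\left(\sum_{i=1}^{n-1} s(X_i - \overline{E}[X_i])\right) h(X_{1:n-1})\right],
\end{aligned}$$

where

$$h(X_{1:n-1}) = \overline{E}\big[\exp\big(s(X_n - \overline{E}[X_n])\big) \mid X_{1:n-1}\big].$$

Due to weak irrelevance,

$$E_P[X_n \mid X_{1:n-1}] \le \overline{E}[X_n \mid X_{1:n-1}] = \overline{E}[X_n];$$

consequently, for any $P$,

$$E_P\big[X_n - \overline{E}[X_n] \mid X_{1:n-1}\big] \le 0.$$

We now use Hoeffding's result (Expression (11)) that if
variable $X$ satisfies $a \le X \le b$ and $E[X] \le 0$, then
$E[\exp(sX)] \le \exp(s^2(b - a)^2/8)$ for any $s > 0$. Thus
for any $P$ we have

$$E_P\big[\exp\big(s(X_n - \overline{E}[X_n])\big) \mid X_{1:n-1}\big] \le \exp(s^2 B_n^2/8)$$

and then $h(X_{1:n-1}) \le \exp(s^2 B_n^2/8)$. Thus

$$\begin{aligned}
\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right)
&\le e^{-s\epsilon}\, \overline{E}\left[\exp\left(\sum_{i=1}^{n-1} s(X_i - \overline{E}[X_i])\right) h(X_{1:n-1})\right] \\
&\le e^{-s\epsilon}\, \overline{E}\left[\exp\left(\sum_{i=1}^{n-1} s(X_i - \overline{E}[X_i])\right) \exp(s^2 B_n^2/8)\right] \\
&\le e^{-s\epsilon} \exp(s^2 B_n^2/8)\, \overline{E}\left[\exp\left(\sum_{i=1}^{n-1} s(X_i - \overline{E}[X_i])\right)\right].
\end{aligned}$$

These inequalities can be iterated to produce:

$$\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right) \le e^{-s\epsilon} \exp\left(s^2 \sum_{i=1}^n B_i^2 / 8\right).$$

Finally, by taking $s = 4\epsilon/\gamma_n$,

$$\overline{P}\left(\sum_{i=1}^n (X_i - \overline{E}[X_i]) \ge \epsilon\right) \le e^{-2\epsilon^2/\gamma_n}.$$

The second inequality in the theorem is proved by noting
that weak irrelevance of $X_1, \dots, X_n$ implies weak irrelevance
of $-X_1, \dots, -X_n$ (as $\underline{E}[X_i] = -\overline{E}[-X_i]$), and
then by taking $\overline{P}\left(\sum_{i=1}^n ((-X_i) - \overline{E}[-X_i]) \ge \epsilon\right)$. □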

3.2 Laws of large numbers 

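Theorem 1 leads to simple proofs of laws of large numbers
already stated by De Cooman and Miranda [5]. To start,
take Assumption (1). Using subadditivity of upper probability
and Theorem 1,

$$\overline{P}\left(\left\{\sum_{i=1}^n X_i \ge n\overline{\mu}_n + \epsilon\right\} \cup \left\{\sum_{i=1}^n X_i \le n\underline{\mu}_n - \epsilon\right\}\right) \le 2e^{-2\epsilon^2/\gamma_n},$$

where as before, $\underline{\mu}_n = (1/n)\sum_{i=1}^n \underline{E}[X_i]$ and
$\overline{\mu}_n = (1/n)\sum_{i=1}^n \overline{E}[X_i]$. By noting that $\underline{P}(A) = 1 - \overline{P}(A^c)$
for any event $A$, by including the endpoints of relevant inequalities,
and by using $n\epsilon$ instead of $\epsilon$:

$$\underline{P}\left(\underline{\mu}_n - \epsilon \le \frac{\sum_{i=1}^n X_i}{n} \le \overline{\mu}_n + \epsilon\right) \ge 1 - 2e^{-2n^2\epsilon^2/\gamma_n} \ge 1 - 2e^{-2n\epsilon^2/B^2},$$

where we define $B = \max_i B_i$. By taking limits, we obtain
a weak law of large numbers:

$$\lim_{n \to \infty} \underline{P}\left(\underline{\mu}_n - \epsilon \le \frac{\sum_{i=1}^n X_i}{n} \le \overline{\mu}_n + \epsilon\right) = 1.$$

An analogue of De Cooman and Miranda's finitary strong
law of large numbers can be deduced as well from the previous
inequalities, as follows. Here and in the remainder
of the paper, $n$, $N$ and $N'$ denote positive integers. For all
$\epsilon > 0$, $N > 0$ and $N' > 0$,

$$\begin{aligned}
\overline{P}\left(\exists n \in [N, N+N'] : \frac{\sum_{i=1}^n X_i}{n} > \overline{\mu}_n + \epsilon\right)
&\le \sum_{n=N}^{N+N'} \overline{P}\left(\frac{\sum_{i=1}^n X_i}{n} > \overline{\mu}_n + \epsilon\right) \\
&\le \sum_{n=N}^{N+N'} e^{-2n\epsilon^2/B^2} \\
&= e^{-2N\epsilon^2/B^2} \sum_{n=0}^{N'} e^{-2n\epsilon^2/B^2} \\
&= e^{-2N\epsilon^2/B^2}\, \frac{1 - e^{-2(N'+1)\epsilon^2/B^2}}{1 - e^{-2\epsilon^2/B^2}} \\
&\le \frac{e^{-2N\epsilon^2/B^2}}{1 - e^{-2\epsilon^2/B^2}}.
\end{aligned}$$

Consequently,

$$\overline{P}\left(\exists n \in [N, N+N'] : \frac{\sum_{i=1}^n X_i}{n} > \overline{\mu}_n + \epsilon\right) < \epsilon,$$

provided that $N$ is a positive integer such that

$$N > -\frac{B^2}{2\epsilon^2} \ln\left(\epsilon\left(1 - e^{-2\epsilon^2/B^2}\right)\right).$$

An analogous argument leads to

$$\overline{P}\left(\exists n \in [N, N+N'] : \frac{\sum_{i=1}^n X_i}{n} < \underline{\mu}_n - \epsilon\right) < \epsilon.$$

By subadditivity of upper probability and conjugacy, we obtain a perhaps
more intuitive statement of the strong law of large
numbers: for all $\epsilon > 0$, there is $N$ such that for any $N'$,

$$\underline{P}\left(\forall n \in [N, N+N'] : \underline{\mu}_n - \epsilon \le \frac{\sum_{i=1}^n X_i}{n} \le \overline{\mu}_n + \epsilon\right) > 1 - 2\epsilon,$$

thus reproducing De Cooman and Miranda's strong laws.

We now present a pair of weak/strong laws of large
numbers under weak irrelevance. De Cooman and Miranda
prove a similar pair of laws by resorting to their previous results
on forward irrelevant natural extensions [5, Sec. 4.1].
The proof offered now is perhaps more direct, using our
analogue of Azuma's inequality.

Theorem 3 If bounded variables $X_1, \dots, X_n$ satisfy weak
irrelevance and Expression (4) holds, then for any $\epsilon > 0$,

$$\underline{P}\left(\underline{\mu}_n - \epsilon \le \frac{\sum_{i=1}^n X_i}{n} \le \overline{\mu}_n + \epsilon\right) \ge 1 - 2e^{-2n\epsilon^2/B^2},$$

and there is $N$ such that for any $N'$,

$$\underline{P}\left(\forall n \in [N, N+N'] : \underline{\mu}_n - \epsilon \le \frac{\sum_{i=1}^n X_i}{n} \le \overline{\mu}_n + \epsilon\right) > 1 - 2\epsilon.$$

Proof. Using subadditivity of upper probability and Theorem 2,
and defining again $B = \max_i B_i$,

$$\overline{P}\left(\left\{\sum_{i=1}^n X_i \ge n\overline{\mu}_n + \epsilon\right\} \cup \left\{\sum_{i=1}^n X_i \le n\underline{\mu}_n - \epsilon\right\}\right) \le 2e^{-2\epsilon^2/\gamma_n},$$

and we obtain the first expression in the theorem. To produce
the second inequality (strong law), note:

$$\begin{aligned}
\overline{P}\left(\exists n \in [N, N+N'] : \frac{\sum_{i=1}^n X_i}{n} > \overline{\mu}_n + \epsilon\right)
&\le \sum_{n=N}^{N+N'} \overline{P}\left(\frac{\sum_{i=1}^n X_i}{n} > \overline{\mu}_n + \epsilon\right) \\
&\le \sum_{n=N}^{N+N'} e^{-2n\epsilon^2/B^2} \\
&\le \frac{e^{-2N\epsilon^2/B^2}}{1 - e^{-2\epsilon^2/B^2}}.
\end{aligned}$$

Again,

$$\overline{P}\left(\exists n \in [N, N+N'] : \frac{\sum_{i=1}^n X_i}{n} > \overline{\mu}_n + \epsilon\right) < \epsilon,$$

provided that $N$ is a positive integer such that

$$N > -\frac{B^2}{2\epsilon^2} \ln\left(\epsilon\left(1 - e^{-2\epsilon^2/B^2}\right)\right).$$

This is "half" of the second expression in the theorem; the
other "half" is proved analogously. □

The theorem easily implies the following concise weak
law of large numbers, by taking limits:

$$\lim_{n \to \infty} \underline{P}\left(\underline{\mu}_n - \epsilon \le \frac{\sum_{i=1}^n X_i}{n} \le \overline{\mu}_n + \epsilon\right) = 1.$$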
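To make the finitary strong law concrete, the following sketch computes the smallest positive integer $N$ exceeding $-(B^2/(2\epsilon^2)) \ln(\epsilon(1 - e^{-2\epsilon^2/B^2}))$ and evaluates the geometric tail bound $e^{-2N\epsilon^2/B^2}/(1 - e^{-2\epsilon^2/B^2})$ at that $N$. The function names and the values $\epsilon = 0.1$, $B = 1$ are hypothetical and chosen only for illustration.

```python
import math

def tail_bound(N, eps, B):
    """Geometric tail bound exp(-2*N*eps^2/B^2) / (1 - exp(-2*eps^2/B^2))."""
    r = math.exp(-2.0 * eps**2 / B**2)
    return r**N / (1.0 - r)

def smallest_N(eps, B):
    """Smallest positive integer N strictly above the threshold in the text."""
    r = math.exp(-2.0 * eps**2 / B**2)
    threshold = -(B**2 / (2.0 * eps**2)) * math.log(eps * (1.0 - r))
    return max(1, math.floor(threshold) + 1)

eps, B = 0.1, 1.0      # hypothetical tolerance and range bound
N = smallest_N(eps, B)
print(N, tail_bound(N, eps, B))
```

Note that the threshold does not depend on $N'$, which is why a single $N$ works for every window length in the strong law.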



4 Laws of large numbers without 
boundedness 

We now consider variables without bounds in their ranges
under the assumption of weak irrelevance; the resulting
laws of large numbers are the main contribution of the paper.
We will assume in this section that countable additivity
holds (Expression (5)). This assumption of countable
additivity implies disintegrability; that is, $E_P[W] = E_P[E_P[W \mid Z]]$
for any $P$, $W$ and $Z$. Thus our setup is
close to the standard (Kolmogorovian) one, where any expectation
functional is a linear, monotone and monotonically
convergent functional that can be expressed through
Lebesgue integration. We only depart from the Kolmogorovian
tradition in explicitly letting a set of such functionals
be permissible given a set of assessments.

We will use a sequence of variables $\{Y_n\}$ defined as
follows:

$$Y_n = \sum_{i=1}^n \big(X_i - E_P[X_i \mid X_{1:i-1}]\big).$$
The key observation is that $Y_n$ is a function of all variables
$X_{1:n}$ such that

$$\begin{aligned}
E_P[Y_n \mid X_{1:n-1}] &= \sum_{i=1}^{n-1} \big(X_i - E_P[X_i \mid X_{1:i-1}]\big) + E_P\big[X_n - E_P[X_n \mid X_{1:n-1}] \mid X_{1:n-1}\big] \\
&= Y_{n-1} + E_P[X_n \mid X_{1:n-1}] - E_P[X_n \mid X_{1:n-1}] \\
&= Y_{n-1};
\end{aligned}$$

so, $\{Y_n\}$ is a martingale with respect to $P$. Thus,

$$\begin{aligned}
E_P\big[(Y_n - Y_{n-1})^2 \mid X_{1:n-1}\big]
&= E_P[Y_n^2 \mid X_{1:n-1}] - 2E_P[Y_{n-1} Y_n \mid X_{1:n-1}] + Y_{n-1}^2 \\
&= E_P[Y_n^2 \mid X_{1:n-1}] - 2Y_{n-1} E_P[Y_n \mid X_{1:n-1}] + Y_{n-1}^2 \\
&= E_P[Y_n^2 \mid X_{1:n-1}] - 2Y_{n-1} Y_{n-1} + Y_{n-1}^2 \\
&= E_P[Y_n^2 \mid X_{1:n-1}] - Y_{n-1}^2.
\end{aligned}$$

And by taking expectations on both sides and noting that
$Y_i - Y_{i-1} = X_i - E_P[X_i \mid X_{1:i-1}]$, we get

$$E_P[Y_n^2] = E_P\big[(X_n - E_P[X_n \mid X_{1:n-1}])^2\big] + E_P[Y_{n-1}^2].$$

Iterating this expression, we obtain:

$$E_P[Y_n^2] = \sum_{i=1}^n E_P\big[(X_i - E_P[X_i \mid X_{1:i-1}])^2\big]. \tag{9}$$
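Expression (9) can be verified by exhaustive enumeration on a small example. The sketch below assumes a hypothetical binary process in which the conditional probability of $X_i = 1$ depends on the fraction of ones observed so far; this dependence rule and the function names are illustrative assumptions, not part of the paper.

```python
from itertools import product

def cond_prob_one(past):
    """Hypothetical rule for P(X_i = 1 | past); always stays in [0.3, 0.7]."""
    if not past:
        return 0.5
    return 0.3 + 0.4 * sum(past) / len(past)

def check_identity(n):
    """Return (E[Y_n^2], sum_i E[(X_i - E[X_i | past])^2]) by enumeration."""
    lhs = 0.0
    rhs = 0.0
    for path in product((0, 1), repeat=n):
        prob = 1.0      # probability of this path
        y = 0.0         # Y_n along the path
        sq = 0.0        # sum of squared increments along the path
        for i, x in enumerate(path):
            p = cond_prob_one(path[:i])   # equals E[X_i | X_1..X_{i-1}]
            prob *= p if x == 1 else 1.0 - p
            d = x - p
            y += d
            sq += d * d
        lhs += prob * y * y
        rhs += prob * sq
    return lhs, rhs

lhs, rhs = check_identity(6)
print(lhs, rhs)
```

The two numbers agree up to rounding because the cross terms between martingale increments have zero expectation, which is exactly what the derivation above uses.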

With these preliminaries, we have: 

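Theorem 4 Assume countable additivity. If variables
$X_1, \dots, X_n$ satisfy weak irrelevance, and $\underline{E}[X_i]$ and
$\overline{E}[X_i]$ are finite quantities such that $\overline{E}[X_i] - \underline{E}[X_i] \le \delta$,
and the variance of any $X_i$ is no larger than a finite quantity
$\sigma^2$, then for any $\epsilon > 0$,

$$\underline{P}\left(\underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right) \ge 1 - \frac{\sigma^2 + \delta^2}{\epsilon^2 n},$$

and there is $N > 0$ such that for any $N' > 0$,

$$\underline{P}\left(\forall n \in [N, N+N'] : \underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right) > 1 - 2\epsilon.$$

Consequently,

$$\forall \epsilon > 0 : \lim_{n \to \infty} \underline{P}\left(\underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right) = 1,$$

$$\underline{P}\left(\limsup_{n \to \infty} \left(\frac{\sum_{i=1}^n X_i}{n} - \overline{\mu}_n\right) \le 0\right) = 1,$$

$$\underline{P}\left(\liminf_{n \to \infty} \left(\frac{\sum_{i=1}^n X_i}{n} - \underline{\mu}_n\right) \ge 0\right) = 1.$$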



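Proof. For a fixed $P$ and for all $\epsilon > 0$,

$$\begin{aligned}
P\left(\underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right)
&= P\left(\sum_{i=1}^n \underline{E}[X_i] - \epsilon n < \sum_{i=1}^n X_i < \sum_{i=1}^n \overline{E}[X_i] + \epsilon n\right) \\
&\ge P\left(\sum_{i=1}^n E_P[X_i \mid X_{1:i-1}] - \epsilon n < \sum_{i=1}^n X_i < \sum_{i=1}^n E_P[X_i \mid X_{1:i-1}] + \epsilon n\right) \\
&\qquad \text{(using weak irrelevance)} \\
&= P\left(-\epsilon < \frac{\sum_{i=1}^n \big(X_i - E_P[X_i \mid X_{1:i-1}]\big)}{n} < \epsilon\right) \\
&= P(-\epsilon < Y_n/n < \epsilon) \\
&= P(|Y_n/n| < \epsilon).
\end{aligned}$$

Applying Chebyshev's inequality and Expression (9),

$$P(|Y_n/n| \ge \epsilon) \le \frac{E_P[Y_n^2]}{\epsilon^2 n^2} = \frac{\sum_{i=1}^n E_P\big[(X_i - E_P[X_i \mid X_{1:i-1}])^2\big]}{\epsilon^2 n^2}.$$

Now write $(X_i - E_P[X_i \mid X_{1:i-1}])^2$ as

$$\big((X_i - E_P[X_i]) + (E_P[X_i] - E_P[X_i \mid X_{1:i-1}])\big)^2,$$

and then:

$$\begin{aligned}
\sum_{i=1}^n E_P\big[(X_i - E_P[X_i \mid X_{1:i-1}])^2\big]
&= \sum_{i=1}^n \Big( E_P\big[(X_i - E_P[X_i])^2\big] \\
&\qquad + 2E_P\big[(X_i - E_P[X_i])(E_P[X_i] - E_P[X_i \mid X_{1:i-1}])\big] \\
&\qquad + E_P\big[(E_P[X_i] - E_P[X_i \mid X_{1:i-1}])^2\big] \Big) \\
&\le \sum_{i=1}^n \Big( \sigma^2 + \delta^2 + 2\big(E_P[X_i] - E_P[X_i \mid X_{1:i-1}]\big) E_P\big[X_i - E_P[X_i]\big] \Big) \\
&= \sum_{i=1}^n (\sigma^2 + \delta^2).
\end{aligned}$$

Hence

$$\sum_{i=1}^n E_P\big[(X_i - E_P[X_i \mid X_{1:i-1}])^2\big] \le n(\sigma^2 + \delta^2), \tag{10}$$

and combining these inequalities, we obtain: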
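$$P(|Y_n/n| \ge \epsilon) \le \frac{\sigma^2 + \delta^2}{\epsilon^2 n},$$

and then

$$P\left(\underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right) \ge 1 - \frac{\sigma^2 + \delta^2}{\epsilon^2 n}$$

for any $P$, as desired. By taking the limit as $n$ grows without
bound, we obtain

$$\lim_{n \to \infty} \underline{P}\left(\underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right) = 1.$$

The proof of the strong law of large numbers uses the
same strategy, but replaces the appeal to Chebyshev's inequality
by an appeal to the Kolmogorov-Hajek-Renyi inequality
(described in the Appendix), following the proof
of the strong law of large numbers by Whittle [29, Thm.
14.2.3]. So, for a fixed $P$ and for all $\epsilon > 0$, we proceed as
previously to obtain:

$$\begin{aligned}
P\left(\forall n \in [N, N+N'] : \underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right)
&\ge P\left(\forall n \in [N, N+N'] : -\epsilon < \frac{Y_n}{n} < \epsilon\right) \\
&= P\big(\forall n \in [N, N+N'] : |Y_n/n| < \epsilon\big).
\end{aligned}$$

As $\{Y_N, Y_{N+1}, \dots, Y_{N+N'}\}$ forms a martingale, we use
the Kolmogorov-Hajek-Renyi inequality to produce:

$$\begin{aligned}
P\big(\forall n \in [N, N+N'] : |Y_n/n| < \epsilon\big)
&\ge 1 - \frac{\sum_{i=1}^N E_P\big[(X_i - E_P[X_i \mid X_{1:i-1}])^2\big]}{\epsilon^2 N^2} - \sum_{i=N+1}^{N+N'} \frac{E_P\big[(X_i - E_P[X_i \mid X_{1:i-1}])^2\big]}{\epsilon^2 i^2} \\
&\ge 1 - \frac{\sigma^2 + \delta^2}{\epsilon^2 N} - \sum_{i=N+1}^{N+N'} \frac{\sigma^2 + \delta^2}{\epsilon^2 i^2} \qquad \text{(using Expression (10))} \\
&\ge 1 - \frac{\sigma^2 + \delta^2}{\epsilon^2 N} - \frac{\sigma^2 + \delta^2}{\epsilon^2} \sum_{i=N+1}^{N+N'} \left(\frac{1}{i-1} - \frac{1}{i}\right) \\
&= 1 - \frac{\sigma^2 + \delta^2}{\epsilon^2 N} - \frac{\sigma^2 + \delta^2}{\epsilon^2} \left(\frac{1}{N} - \frac{1}{N+N'}\right) \\
&\ge 1 - 2\,\frac{\sigma^2 + \delta^2}{\epsilon^2 N}.
\end{aligned}$$

Consequently, for integer $N > (\sigma^2 + \delta^2)/\epsilon^3$, we obtain the
desired inequality

$$\underline{P}\left(\forall n \in [N, N+N'] : \underline{\mu}_n - \epsilon < \frac{\sum_{i=1}^n X_i}{n} < \overline{\mu}_n + \epsilon\right) > 1 - 2\epsilon.$$

The proof of the Kolmogorov-Hajek-Renyi inequality can be extended
to an infinite intersection of (decreasing) events expressed
as $\{\forall j \ge 1 : |X_j| < \epsilon_j\}$; thus

$$\forall \epsilon > 0 : \forall \delta > 0 : \exists N > 0 : \quad P\left(\forall m \ge N : \frac{\sum_{i=1}^m \big(X_i - \overline{E}[X_i]\big)}{m} < \epsilon\right) > 1 - \delta,$$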

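and this is equivalent to:

$$\forall \epsilon > 0 : \lim_{N \to \infty} P\left(\forall m \ge N : \frac{\sum_{i=1}^m \big(X_i - \overline{E}[X_i]\big)}{m} < \epsilon\right) = 1.$$

As the events in these probability values form an increasing
sequence, we have, for all $\epsilon > 0$,

$$P\left(\exists N > 0 : \forall m \ge N : \frac{\sum_{i=1}^m \big(X_i - \overline{E}[X_i]\big)}{m} < \epsilon\right) = 1.$$

Now this is equivalent to $\forall k > 0 : P(A_k) = 1$, where
$A_k = \{\exists N > 0 : \forall m \ge N : (1/m) \sum_{i=1}^m (X_i - \overline{E}[X_i]) < 1/k\}$,
and because $P(\cup_{k>0} \neg A_k) \le \sum_{k>0} P(\neg A_k) = 0$,
we have $P(\forall k > 0 : A_k) = 1$, so

$$P\left(\forall k > 0 : \exists N > 0 : \forall m \ge N : \frac{\sum_{i=1}^m \big(X_i - \overline{E}[X_i]\big)}{m} < \frac{1}{k}\right) = 1.$$

Since this holds for every $P$, this is exactly the desired expression

$$\underline{P}\left(\limsup_{n \to \infty} \left(\frac{\sum_{i=1}^n X_i}{n} - \overline{\mu}_n\right) \le 0\right) = 1.$$

A similar argument proves the last inequality in the theorem,
starting from:

$$\forall \epsilon > 0 : \forall \delta > 0 : \exists N > 0 : \quad P\left(\forall m \ge N : \frac{\sum_{i=1}^m \big(X_i - \underline{E}[X_i]\big)}{m} > -\epsilon\right) > 1 - \delta.$$
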
5 Discussion 

The concentration inequalities and laws of large numbers 
proved in this paper assume rather weak conditions of epis- 
temic irrelevance. When compared to usual laws of large 
numbers, both premises and consequences are weaker: ex- 
pectations are not assumed precisely known, and conver- 
gence is interval-valued. 

Theorems 1 and 2 and their ensuing laws of large numbers
are implied by De Cooman and Miranda's seminal
work [5] (and their results generalize several previous efforts
[12]). Actually, De Cooman and Miranda start from
a weaker condition of forward factorization that is implied
both by Assumption (1) and weak irrelevance. The possible
advantage of our proof techniques for these two theorems
is that they are rather close to well-known methods in
standard probability theory, such as Hoeffding's inequality
(it should be noted that De Cooman and Miranda already
indicate the similarity between their inequalities and Hoeffding's).

The most significant results of the paper employ weak
irrelevance to produce concentration inequalities (Theorem
2) and laws of large numbers (Theorems 3 and 4). The latter
theorem is possibly the most valuable contribution. The
strategy for most proofs is to translate assumptions of weak
irrelevance into facts regarding martingales, and to adapt
results for martingales to this setting. This strategy keeps
the proofs relatively short and close to well-known results
in probability theory. The connection between lower/upper
expectations and the theory of martingales seems rather
natural [14, 25], but the relationship between epistemic irrelevance
and martingales does not appear to have been explored
in depth so far. We note that the basic constraint
defining martingales (that is, $E[Y_n \mid X_{1:n-1}] = Y_{n-1}$) is
preserved by convex combination of mixtures; therefore,
the study of martingales seems appropriate when one deals
with convex sets of probability measures; certainly it
seems less contorted than the analysis through stochastic
independence, as stochastic independence is not preserved
by convex combination.

The proofs presented in this paper need assumptions 
of disintegrability that can be easily satisfied if countable 
additivity is adopted. It is an open question whether similar 
results can be proven without disintegrability, particularly 
when one deals with unbounded variables. 
A Two auxiliary inequalities

The following inequality is a simple extension of a basic
result by Hoeffding [8, 15]: If variable $X$ satisfies $a \le X \le b$
and $E[X] \le 0$, then for any $s > 0$,

$$E[\exp(sX)] \le \exp(s^2(b - a)^2/8). \tag{11}$$

First, the inequality is clearly valid if $a = b$, or if $a = 0$, or
if $b \le 0$. From now on, suppose $b > 0 > a$. By convexity
of the exponential function,

$$\exp(sx) \le \frac{x - a}{b - a}\, e^{sb} + \frac{b - x}{b - a}\, e^{sa} \quad \text{for } x \in [a, b].$$

Given monotonicity of expectations and $E[X] \le 0$,

$$E[\exp(sX)] \le \frac{b}{b - a}\, e^{sa} - \frac{a}{b - a}\, e^{sb} = \exp\big(\phi(s(b - a))\big)$$

for $\phi(u) = -pu + \log(1 - p + pe^u)$ with $p = -a/(b - a)$
(and note that $p \in (0, 1]$ in the situation under consideration).
Given that $\phi(0) = \phi'(0) = 0$ and $\phi''(u) \le 1/4$ for
$u > 0$ (as the maximum of $\phi''(u)$ is 1/4, attained at $e^u = (1 - p)/p$),
we can use Taylor's theorem as follows. For
some $v \in (0, u)$, $\phi(u) = \phi(0) + u\phi'(0) + (u^2/2)\phi''(v) \le u^2/8$
and consequently $\phi(s(b - a)) \le s^2(b - a)^2/8$. By
putting together these inequalities, we obtain Expression
(11).
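The two facts used in this argument can be spot-checked numerically: that $\phi(u) \le u^2/8$, and that a zero-mean two-point variable on $\{a, b\}$ has moment generating function exactly $\exp(\phi(s(b - a)))$. The sketch below is only a sanity check on a few hypothetical values, not a proof; the function names are assumptions introduced for the example.

```python
import math

def phi(u, p):
    """phi(u) = -p*u + log(1 - p + p*exp(u)), as defined in the text."""
    return -p * u + math.log(1.0 - p + p * math.exp(u))

def mgf_two_point(s, a, b):
    """E[exp(s*X)] for X in {a, b} with E[X] = 0, so P(X = b) = -a / (b - a)."""
    p = -a / (b - a)
    return (1.0 - p) * math.exp(s * a) + p * math.exp(s * b)

# phi(u) stays below u^2 / 8 on a small grid of hypothetical values.
for p in (0.1, 0.25, 0.5, 0.9):
    for u in (0.1, 1.0, 2.0, 10.0):
        print(p, u, phi(u, p), u * u / 8.0)

# Two-point example with a = -1, b = 3, s = 0.5 (so u = s * (b - a) = 2).
print(mgf_two_point(0.5, -1.0, 3.0), math.exp(0.5**2 * 4.0**2 / 8.0))
```

The two-point distribution is the extreme case for the convexity step, which is why its moment generating function matches $\exp(\phi)$ exactly rather than merely being bounded by it.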

We now review the Kolmogorov-Hajek-Renyi inequality,
almost exactly as proved by Whittle [29]; this is presented
just to indicate the role of (elementwise) disintegrability
in the derivation. Let $\{X_i\}$ be a martingale with
$X_0 = 0$, and let $\{\epsilon_i\}$ be a sequence $0 = \epsilon_0 \le \epsilon_1 \le \dots$; the
inequality is

$$P\big(\forall i \in [1, n] : |X_i| < \epsilon_i\big) \ge 1 - \sum_{i=1}^n \frac{E_P\big[(X_i - X_{i-1})^2\big]}{\epsilon_i^2}.$$

To prove this inequality, define $A_n = \{\forall j \in [1, n] : |X_j| < \epsilon_j\}$.
Using $\xi_j = X_j - X_{j-1}$ and again denoting an event
and its indicator function by the same symbol, we have

$$\begin{aligned}
P(A_n) = E_P[A_n] &= E_P\big[A_{n-1}\{|X_n| < \epsilon_n\}\big] \\
&\ge E_P\big[A_{n-1}(1 - X_n^2/\epsilon_n^2)\big] \qquad \text{(as } \{|X| < \epsilon\} \ge 1 - X^2/\epsilon^2\text{)} \\
&= E_P\big[A_{n-1}\big(1 - (X_{n-1}^2 + \xi_n^2)/\epsilon_n^2\big)\big] \qquad \text{(by the martingale property)} \\
&\ge E_P\big[A_{n-2}(1 - X_{n-1}^2/\epsilon_{n-1}^2)\big] - E_P\big[\xi_n^2/\epsilon_n^2\big] \\
&\qquad \text{(as } \epsilon_{n-1} \le \epsilon_n \text{ and } \{|X| < \epsilon\}(1 - X^2/\epsilon^2) \ge (1 - X^2/\epsilon^2)\text{)}.
\end{aligned}$$

Iteration of the last inequality yields the result. Note that it
was necessary to apply disintegrability of $P$ when applying
the martingale property (that is, elementwise disintegrability
is used).